Back

Journal of Vision

Association for Research in Vision and Ophthalmology (ARVO)

Preprints posted in the last 7 days, ranked by how well they match Journal of Vision's content profile, based on 92 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
VOGeo-Gaze: Calibration-Free, Geometry-Aware Deep Learning for Real-Time Gaze Tracking in Clinical Video-Oculography

Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.

2026-05-29 health informatics 10.64898/2026.05.27.26354254 medRxiv
Top 0.2%
1.9%
Show abstract

Quantitative eye movement analysis is important for neuro- logical diagnostics, yet existing video-oculography (VOG) systems typ- ically require calibration, device-specific settings, or accurate gaze la- bels. We present VOGeo-Gaze, a real-time, calibration-free, geometry- aware neural network that estimates gaze by reconstructing anatomi- cally meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold- standard VOG system. It achieved median absolute angular errors of 0.33{whitebullet} horizontally and 0.35{whitebullet} vertically, with nearly 92% of recordings below 1{whitebullet} error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and inter- pretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.

2
Developing and Evaluating Deep Learning Approaches for Visual Field Denoising in Glaucoma

Baek, J. S.; Lokhande, A.; Neuenschwander, D.; Shi, M.; Wang, M.

2026-06-01 ophthalmology 10.64898/2026.05.29.26354019 medRxiv
Top 0.3%
1.4%
Show abstract

Purpose To investigate the relative efficacy of nine distinct visual field (VF) denoising artificial intelligence (AI) methods and a pathology-aware AI strategy to discourage over-correction of glaucomatous defects. Design Retrospective study. Participants 87,940 paired visual field (VF) and optical coherence tomography (OCT) samples from a tertiary academic center. Methods Denoising models were trained on a separate VF-only dataset and evaluated on an independent structure-function dataset of paired VF-OCT samples. We implemented and evaluated nine distinct VF denoising strategies representing three broad categories: baseline measurements, self-supervised and image restoration models (including Noise2Noise, Noise2Void, and NAFNet), and latent variable compression-based models (autoencoders and variational autoencoders). All models were designed to reconstruct VF sensitivity maps. We then predicted retinal nerve fiber layer thickness (RNFLT) maps from the denoised VFs using a fixed, independently trained VF-to-RNFLT prediction model. Main Outcome Measures Predicted VF and RNFLT maps and resultant evaluation metrics. Results The raw VF baseline achieved a global R2 of 0.5468 and MAE of 16.83 um. Restoration-based models maintained or slightly improved concordance, with the pathology-aware NAFNet achieving the highest global R2 of 0.5485 and a comparable MAE of 16.82 um. In contrast, compression-based models degraded concordance, with CNN-VAE showing a significant reduction (R2 approximately 0.50). In severe glaucoma, concordance decreased across all methods; however, compression architectures exhibited disproportionately greater degradation compared with restoration-based approaches. Conclusions We present a comparative benchmark of AI-based VF denoising strategies paired with structure-function evaluation. While restoration-based models can reduce variability without loss of biological signal, latent compression risks attenuating clinically meaningful defects. Visually smoother fields are not necessarily more biologically accurate.

3
Deriving OCT-Equivalent Retinal Nerve Fiber Layer Thickness Maps from Fundus Photographs with Deep Learning Improves Glaucoma Diagnosis

Shi, L.; Shi, M.; Chung, I. Y.; Pasquale, L. R.; Shen, L. Q.; Wang, M.

2026-05-27 ophthalmology 10.64898/2026.05.26.26354047 medRxiv
Top 0.6%
0.5%
Show abstract

Purpose: To develop and evaluate a deep learning model that predicts optical coherence tomography (OCT)-equivalent retinal nerve fiber layer thickness (RNFLT) maps directly from color fundus photographs and to assess their diagnostic value for glaucoma detection. Design: Retrospective model development and evaluation study. Participants: 15,031 paired fundus photographs and spectral-domain OCT scans collected at Massachusetts Eye and Ear between 2011 and 2022. Methods: Paired fundus and OCT images were used to train a U-Net-based model to predict pixel-wise RNFLT maps with artifact-corrected supervision. Diagnostic performance was evaluated across single-modality models (fundus photos only, real RNFLT maps, predicted RNFLT maps) and multimodal fusion models (fundus + predicted RNFLT maps). Stratified analyses examined model performance across glaucoma severity and demographic subgroups. Glaucoma was defined based on standard criteria applied to Humphrey 24-2 visual field testing. Main Outcome Measures: Mean absolute error (MAE) and structural similarity index (SSIM) for RNFLT map prediction. Area under the ROC curve (AUC) and accuracy for glaucoma detection. Results: RNFLT map prediction achieved a MAE = 15.4 m and a SSIM = 0.65, measured against artifact-corrected RNFLT maps derived from OCT. For glaucoma detection, the predicted RNFLT-only classifier outperformed the fundus-only classifier (AUC 0.889 vs 0.883, p < 0.005; Accuracy 82.0% vs 78.0%), but performed worse than the real-RNFLT-only classifier (AUC 0.889 vs 0.903, p < 0.005). Multimodal fusion of fundus images with predicted RNFLT maps improved performance, achieving an AUC of 0.909, outperforming all single-modality inputs (p < 0.005 vs fundus-only, predicted-RNFLT-only, and real-RNFLT-only). Performance gains between the fundus-only and the multimodal classifier were greater in early-stage glaucoma compared to severe cases: accuracy increased from 55.3% to 64.0% in mild cases, from 71.5% to 80.4% in moderate cases, and from 90.0% to 94.6% in severe cases. Conclusions: Predicted RNFLT maps derived from fundus photographs provide quantitative, OCT-like structural information and improve glaucoma detection. Unlike prior work that predicted only summary RNFLT values, our model generates full RNFLT maps that better support glaucoma classification than fundus images alone. This approach offers a scalable pathway for early glaucoma screening and expands diagnostic access in resource-limited settings.

4
Deep Learning Prediction of Personalized Peripapillary Retinal Nerve Fiber Layer Thickness Norms from Fundus Images in Glaucoma

Yildiz, E.; Zha, L.; Zebardast, N.; Shi, M.; Wang, M.

2026-05-27 ophthalmology 10.64898/2026.05.26.26354081 medRxiv
Top 0.8%
0.3%
Show abstract

Purpose: To predict retinal nerve fiber layer thickness (RNFLT) norms from fundus images. Methods: We selected 18,000 OCT scans and visual fields (VF) from the Massachusetts Eye and Ear Glaucoma Service. A U-Net-based deep learning model was developed to predict RNFLT norms from OCT en face fundus images. A total of 10,000 OCT scans with normal VFs (mean deviation [MD] [&ge;] -1 dB, glaucoma hemifield test within normal limits, and pattern standard deviation probability > 5%) tested within 30 days were used for training, while the remaining 8,000 OCT scans (mean VF MD: 3.3 +/- 4.9 dB), including 2,419 scans with normal VFs, were used for evaluation. Structure-function correlations between RNFLT maps and VFs were assessed using linear regression and VGG-16 across original RNFLT maps, deviation maps, and their combination. Performance was evaluated using correlation coefficients, mean absolute error (MAE), and R-squared. Results: Predicted RNFLT norm maps showed agreement with baseline RNFLT maps in eyes with normal VFs (R-squared = 0.81 +/- 0.13). RNFLT deviation maps correlated more strongly with VF MD than original RNFLT maps (R = 0.42 vs. 0.19, p < 0.01). In deep learning-based VF prediction, combining original and deviation maps achieved the best performance (MAE = 3.31 dB, R-squared = 0.39), outperforming the model (p < 0.05) using original RNFLT maps alone (MAE = 3.36 dB, R-squared = 0.35). Conclusions: Deep learning can estimate individualized RNFLT norms and improve structure-function assessment in glaucoma. Translational Relevance: Personalized RNFLT norm prediction may improve detection of glaucomatous damage.

5
DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv
Top 1%
0.1%
Show abstract

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics - ranging from semantic similarity to large language model (LLM) based judges - often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

6
Case-level artificial intelligence for multi-photo teledermatology submissions: development and internal validation using patient-submitted dermatology images

Patel, V. P.; Sheth, N.; Patel, A.; Patel, Y.

2026-06-01 dermatology 10.64898/2026.05.21.26353816 medRxiv
Top 1%
0.1%
Show abstract

Background: Store-and-forward teledermatology commonly relies on several patient-submitted photographs of the same concern, but most dermatology artificial intelligence models classify single images independently. Objective: To develop and internally validate a case-level diagnostic-support model that aggregates multiple patient-submitted photographs for common dermatologic conditions. Methods: We conducted a retrospective diagnostic-modeling study using the Skin Condition Image Network, a public dataset of deidentified self-taken dermatology images from US adults. We curated 2,336 cases comprising 5,041 images across 10 common inflammatory, allergic, and infectious conditions. Cases were split at the submission level into training, validation, and held-out test sets. Frozen general-purpose and dermatology-specific encoders were compared with image-level classifiers and a gated-attention multiple instance learning model that generated one case-level output from 1-3 images. Results: The strongest image-level baseline, dermatology-specific embeddings with random forest classification, achieved macro/micro ROC-AUCs of 0.797/0.854. Case-level aggregation improved discrimination, with dermatology-specific embeddings plus multiple instance learning achieving mean macro/micro ROC-AUCs of 0.819/0.863 across repeated stratified experiments. The locked final model achieved macro/micro ROC-AUCs of 0.800/0.849 on the held-out test set. Balanced-threshold sensitivity/specificity examples were 0.702/0.688 for eczema and 0.818/0.826 for urticaria. Limitations: Internal validation used a 10-condition subset from a US volunteer dataset; external validation, calibration, subgroup performance analysis, and prospective workflow studies are required. Conclusion: Modeling the teledermatology submission as a multi-image case better reflects asynchronous dermatology workflow than single-image classification. The model is preliminary clinician-facing support for structured review and triage, not autonomous diagnosis.

7
Prevalence and pattern of refractive errors among Yanomami Indigenous people in the Brazilian Amazon: a cross-sectional observational study

Chagas Ferreira, M. C.; Pellegrini, M. A.; Sequeira, B. J.

2026-05-26 ophthalmology 10.64898/2026.05.25.26354064 medRxiv
Top 2%
0.1%
Show abstract

Background: Refractive errors are the leading cause of preventable visual impairment worldwide, yet data from isolated Indigenous populations remain virtually absent from the global literature. The Yanomami, one of the largest Indigenous peoples in the Americas with recent and limited contact with non-Indigenous society, have no prior epidemiological data on refractive errors. Methods: A cross-sectional observational study was conducted in 2024 at the Yanomami Indigenous Health House, Boa Vista, Roraima, Brazil. A total of 158 self-identified Yanomami individuals aged 5 years or older were examined by an ophthalmologist. Refractive status was classified according to International Myopia Institute criteria. Results: Emmetropia was observed in 67.7% of participants, with a marked age-related decline from 100% in children aged 5 to 9 years to 38.6% in those aged 40 to 59 years. Myopia was present in 16.5% of participants, all low myopia; it was absent in children under 10 years and no high myopia was identified. Astigmatism affected 24.1% of participants and hyperopia 13.3%. Presbyopia was identified in 25.9%. Overall, 25.3% of participants presented with reduced visual acuity attributable to uncorrected refractive error, of whom 67.5% improved to normal or near-normal acuity (p < 0.001). Conclusions: This is the first characterisation of the Yanomami refractive profile, revealing a distinct myopia pattern shaped by high outdoor exposure and minimal near-work demands. Despite this, refractive correction remains effectively inaccessible to this population, leaving preventable visual impairment unaddressed and reflecting a profound health inequity. Corrective lens provision represents a high-impact, scalable intervention for this underserved community.

8
High Resolution Multi-depth Quantification of the Retinal Nerve Fiber Layer

Callet, C.; Bertrand, M.; Guzman, K.; Mece, P.; Rossi, E. A.; Grieve, K.

2026-06-01 ophthalmology 10.64898/2026.05.22.26353127 medRxiv
Top 2%
0.1%
Show abstract

The retinal nerve fiber layer, composed of axon bundles converging toward the optic nerve, is a key biomarker for diagnosing and monitoring glaucoma and other neurodegenerative diseases. High-resolution en face imaging of individual nerve fiber bundles offers morphological information beyond what conventional optical coherence tomography provides, yet clinical integration remains limited by the lack of automated analysis tools and normative data. Here, we imaged 14 healthy volunteers using time-domain full-field optical coherence tomography and adaptive optics scanning laser ophthalmoscopy, and developed automated pipelines to quantify bundle width, trajectory, tortuosity, and orientation. Bundles were on average 25% wider at shallower retinal depths, width measurements were consistent across imaging modalities, and estimated axon count per bundle decreased significantly with age. Global trajectory analysis revealed systematic deviations of high resolution data from existing mathematical models, particularly in the temporal sector, leading us to propose two refined trajectory models. These normative results provide a foundation for high resolution biomarkers for use in investigations of retinal neurodegeneration.

9
Safety and Biological Activity of Intravitreal OGX110, a CXCR3 Agonist, in Persistent Neovascular Age-Related Macular Degeneration: A Phase I Dose-Escalation Study

Wells, A.; Boyer, D.; Goldberg, R.; Hohman, T.; Maturi, R.; Patel, S.

2026-05-30 ophthalmology 10.64898/2026.05.21.26353430 medRxiv
Top 2%
0.1%
Show abstract

Purpose: To evaluate the safety and exploratory outcomes of a single intravitreal injection of OGX110, a peptide agonist of CXCR3, in eyes with persistent fluid secondary to neovascular age-related macular degeneration (nAMD) despite ongoing anti-vascular endothelial growth factor (anti-VEGF) therapy. Methods: This prospective, open-label, sequential dose-escalation phase I study (NCT05904691) enrolled subjects receiving standard-of-care intravitreal anti-VEGF therapy. Subjects received a single intravitreal injection of OGX110 at 0.5 mg, 1.0 mg, or 2.0 mg (n=3 per cohort), 7 to 14 days after the anti-VEGF injection. Results: All nine enrolled subjects completed follow-up through day 56. Two subjects (22%) experienced at least 1 adverse event (AE); all were mild and unrelated to study treatment. Exploratory analyses showed a BCVA change of +1.4 letters following anti-VEGF injection and +4.4 letters from OGX110 baseline to 4 weeks (P < 0.05). Six of 9 subjects gained at least 3 ETDRS letters after OGX110. Anatomic responses were heterogeneous. Four eyes showed a reduction in CRT after anti-VEGF injection that was maintained after OGX110 administration. One additional eye demonstrated a substantial reduction in CRT after OGX110 despite minimal response to anti-VEGF treatment. Conclusions: A single intravitreal injection of OGX110 was well tolerated. Exploratory functional and anatomic findings suggest biologic activity; interpretation is limited by small sample size, open-label design, absence of a concurrent control group, and inter-subject heterogeneity. These results support further study in a controlled trial. Translational Relevance: OGX110 represents a mechanistically distinct investigational approach for nAMD that may warrant further evaluation in eyes with persistent.

10
Automated Segmentation of Cerebral Arteries on Three-Dimensional Rotational Angiography Using nnUNet v2: Prospective Validation with Quantitative Metrics and Expert Qualitative Assessment

Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.

2026-05-26 radiology and imaging 10.64898/2026.05.20.26353640 medRxiv
Top 2%
0.0%
Show abstract

Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: A nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.

11
Can Large Language Models Diagnose Primary Immunodeficiency from Patient-Described Symptoms?

Reteig, L. C.; Woloshin, S.; Maglione, P. J.; Farmer, J. R.; Ong, M.-S.

2026-05-27 allergy and immunology 10.64898/2026.05.26.26353818 medRxiv
Top 2%
0.0%
Show abstract

Patients with primary immunodeficiency (PID) often face prolonged diagnostic delays and may increasingly turn to large language models (LLMs) to interpret their symptoms during this period. We evaluated whether an LLM could recognize PID from symptom descriptions derived from interviews with 21 PID patients. In a prior study, we showed that GPT-4o identified PID in 96% of cases when prompted with physician-written patient histories (Rider et al., JACI, 2024). Here, when prompted with symptom descriptions in patients' own words, GPT-5 identified PID in only 7 cases (33%), although it more broadly suggested immune system issues in 18 cases (81%). The gap between these findings indicates that LLMs are sensitive to the language and framing of symptom descriptions, performing substantially worse when patients describe their own symptoms in everyday language than when clinicians summarize patient histories in structured medical terms. This study underscores the need to carefully evaluate how LLMs are used in patient-facing applications.

12
Closed-Loop Quality Assurance for Production Clinical AI Documentation

Napier, A.; Wiley, J.; Heslin, M.

2026-05-29 health informatics 10.64898/2026.05.27.26353977 medRxiv
Top 3%
0.0%
Show abstract

A closed-loop quality system deployed across thirteen US hospital sites resolved physician complaints with zero regressions on 42 tracked cases across 1,089 optimization iterations, while a deterministic assembly-agent replacement cut H+P trace latency from 19.6 s to 10.8 s (-8.8 s, 95% CI [-10.5, -7.1] s; n = 100 pre, n = 100 post). We report four observations and an architectural follow-through. First, the same binary-check instrument produces opposite outcomes depending on the question asked: "maximize this score" produces structurally-correct notes that physicians reject (Spearman rho = -0.077, 95% CI [-0.40, 0.26], n = 36); "did this specific fabrication stop?" produces rater-invariant deployment decisions. Second, in our pipeline, assembly-stage agents did not respond to prompt optimization the way reasoning agents did: four consecutive optimization attempts produced 18-28 point regressions. Third, physician preference is rater-fragile at typical clinical-AI calibration sample sizes (Cohen's kappa = 0.028 between two board-certified physicians, 95% CI [-0.30, 0.36] on n = 35 overlapping pairs). Fourth, the architectural punchline: six weeks after the prediction, the LLM call at the chart-assembly step was replaced with a deterministic renderer (sub-500-character template plus sandboxed scripting), lifting the defect-free rate on a 51-case holdout from 49% to 84%. We introduce a Pareto-with-absolute-floors acceptance rule (multi-axis commit with severity-class categorical vetoes) as a methodological contribution distinct from scalar-reward acceptance in standard prompt-optimization frameworks. Cross-iteration rejection memory prevents the loop from re-proposing edits already rejected three or more times. A reproducibility bundle (anonymized ablation per-case counts, bootstrap-CI data, analysis scripts) is released under CC BY 4.0 at github.com/sayvant/SQS-Auditor-paper-data.

13
AI Decision Support for Challenging Teledermatology Cases: MedGemma Performance in the Dermatology ECHO Program

Appiagyei, J. B.; Otu, R. O.; Henry, M. K.; Casterline, B. W.; Becevic, M.

2026-05-26 health informatics 10.64898/2026.05.21.26353523 medRxiv
Top 3%
0.0%
Show abstract

Teledermatology expands access to dermatologic expertise in rural settings, yet diagnostic uncertainty persists in low-resource primary care. This retrospective study evaluated MedGemma-4B-IT, a compact multimodal vision-language model, as adjunctive clinical decision support for challenging diagnostic cases. We analyzed 77 zero-concordance cases (360 clinical photographs) from a Dermatology Extension for Community Healthcare Outcomes (ECHO) tele-mentoring program (2016-2021). Zero-concordance cases showed no overlap between primary clinician provisional diagnosis and dermatologist-confirmed diagnosis. The model was prompted using dermatologist-style format to generate ranked differential diagnoses. Performance was assessed using strict case-level top-k exact-match accuracy and relaxed matching criteria based on fuzzy string similarity. MedGemma achieved 0.0% strict top-1 accuracy, 1.3% top-3 accuracy, 3.9% top-5 accuracy, and 3.9% top-10 accuracy. Relaxed concept-level matching achieved 28.6% top-1, 63.6% top-5, and 67.5% top-10 accuracy. Image-level accuracy was 44.2% (159/360, 95% CI 39.0-49.5%). The model surfaced the correct diagnosis within differential lists in 45.5% of cases despite no exact top-1 matches, suggesting utility for differential expansion rather than definitive diagnosis. Performance varied across diagnostic categories, with highest accuracy in Other categories (54.5%) and lowest in neoplastic conditions (0.0%). Common errors included confusion between inflammatory and other diagnostic groupings. These findings characterize MedGemma performance on real-world teledermatology cases and inform safe, clinician-in-the-loop integration into teledermatology workflows where specialist oversight remains essential.

14
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 3%
0.0%
Show abstract

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF [&le;] 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF [&le;] 40%. After fine-tuning on less than 10% of external data, LVEF [&le;] 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

15
Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv
Top 3%
0.0%
Show abstract

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded into the electronic medical record used to warn clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient-level (where a maximum prediction prior to the onset of sepsis is used to measure performance) vs novel prediction-level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis 3 criteria occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination AUC 0.86; [IQR 0.85, 0.87], whereas prediction-level analyses demonstrated lower performance AUC 0.62; [IQR 0.57, 0.65]. Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.

16
Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv
Top 3%
0.0%
Show abstract

ABSTRACT OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis of major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Embase, Cochrane, and Clinicaltrials.gov were searched between 1986 and December 2023, for studies comparing CHG with PI for vaginal antisepsis of major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included Randomized Controlled Trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis of major gynecologic operations. The primary outcome was surgical site infections (SSIs) and the secondary outcome was urinary tract infections (UTIs) and vaginal irritation. METHODS: Summary estimates were calculated by fixed effects models when I2 [&le;] 25% and by random effects models when I2 > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were randomized controlled trials (RCTs). 9538 patients were included, 4300 (45%) of whom were allocated to CHG and 5238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n= 9538 patients; RR 1.20; 95% CI 0.92-1.57; I2 =0%). In contrast, a significantly higher risk of UTIs was observed for vaginal antisepsis with CHG than with PI (n=6061 patients; RR 1.48 95% CI 1.03-2.14; I2 = 0%). CONCLUSION: In our SRMA, there were no significant differences in SSI risk when either CHG or PI was utilized for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of post-operative UTIs following major gynecologic surgery. Our findings support current guidelines that form of vaginal antisepsis can be used for SSI prevention. They also suggest that PI may result in fewer postoperative UTIs but further randomized studies are needed to support these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, Chlorhexidine, Povidone Iodine, surgical antiseptic,

17
Optical coherence tomography as a biomarker for frontotemporal dementia: a systematic review & meta-analysis

Wang, E.; Kohli, A.; Taha, H. B.

2026-05-27 neurology 10.64898/2026.05.19.26353366 medRxiv
Top 3%
0.0%
Show abstract

Background: Frontotemporal dementia (FTD) lacks widely accessible disease-specific biomarkers. Optical coherence tomography (OCT) and OCT angiography (OCTA) may provide non-invasive measures of retinal changes associated with neurodegeneration. We conducted a systematic review and meta-analysis evaluating retinal biomarkers in FTD compared with Alzheimer disease (AD) and controls. Methods: A systematic search of PubMed and Embase was conducted through April 25, 2026 according to PRISMA guidelines. Studies evaluating OCT/OCTA biomarkers in FTD with comparator groups were included. Inverse weighted random-effects models, publication bias assessments, and meta-regressions were performed. Results: Ten studies involving 139 individuals with FTD, 87 with AD, 29 with mild cognitive impairment, 14 with TDP-43 proteinopathy, 5 with tauopathy, and 255 controls were included in the systematic review; five studies were eligible for meta-analysis. Compared with AD, individuals with FTD demonstrated significantly thinner retinal nerve fiber layer (RNFL) thickness (SMD = -0.61, 95% CI -0.98, -0.24). Compared with controls, individuals with FTD exhibited significantly thinner ganglion cell layer-inner plexiform layer (GCL-IPL) thickness (SMD = -0.55, 95% CI -1.02, -0.08), whereas pooled analyses across multiple retinal biomarkers were non-significant (SMD = -0.19, 95% CI -0.52, 0.14). RNFL thickness correlated negatively with female % in FTD and positively with age in both AD and controls. Conclusions: Individuals with FTD exhibit lower RNFL thickness than AD and lower GCL-IPL thickness than controls, suggesting retinal alterations may reflect neurodegeneration. However, larger longitudinal studies with standardized OCT/OCTA protocols are needed to determine the diagnostic and prognostic utility of retinal biomarkers in FTD

18
Morphological feature remodeling of intracranial arteries in the context of inflammation and HIV-associated cognitive impairment

Hoang, N.; Yang, H.; Uddin, M. N.; Zhong, J.; Faiyaz, A.; Singh, M. V.; Boodoo, Z. D.; Sutton, K. R.; Wang, H. Z.; Sahin, B.; Khan, M. W.; Weber, M. T.; Yuan, C.; Chen, L.; Schifitto, G.

2026-05-27 hiv aids 10.64898/2026.05.19.26353071 medRxiv
Top 3%
0.0%
Show abstract

Background: Despite the success of combination antiretroviral therapy (cART), vascular comorbidities, including cerebrovascular disease, are more prominent in people living with HIV (PLWH) compared to people without HIV (PWOH). However, quantitative assessments of cerebrovascular morphometry and their associations with cognitive outcomes in the context of HIV are still limited. In this study, we explore this missing link. Methods: Magnetic Resonance Angiography (MRA) data, blood markers, and neurocognitive assessments were collected from 73 PWOH subjects (male: 57, female: 16; age: 53 {+/-} 16) and 99 PLWH subjects (male: 66, female: 30, age: 53 {+/-} 11). Vessel morphometric features were quantified using intraCranial Artery Feature Extraction (iCafe) to investigate associations between vessel morphometry, markers of monocytes, endothelial cell activation, and cognitive performance. Results: HIV status predicted a lower total number of branches ({beta} = -0.224, p = 0.001, d = -0.517) and shorter total distal length ({beta} = -0.173, p = 0.021, d = -0.370) with a moderate effect size. Total branch number was found to be negatively associated with plasma levels of monocyte markers (sCD14: r = -0.167, p = 0.033; sCD163: r = -0.157, p = 0.045) and positively correlated with white matter cerebral blood flow (r = 0.550; p [&le;] 0.05). HIV status was the strongest predictor of overall cognitive performance in ANCOVA model ({beta} = -0.219, p = 0.006, d = -0.453). Conclusions: Our results suggest that cognitive impairment in PLWH is associated with vessel morphology metrics. Monocyte immune activation may contribute to changes in vessel morphology.

19
Core Components for Emergency Medical Dispatch Systems: An International Delphi Consensus Study

Weber, K.; Stassen, W.; Jayaraman, S.; Odland, M. L.; Nishimwe, A.; Welgama, I.; Wallis, L.; Ignatowicz, A.; Davies, J. P.

2026-05-28 emergency medicine 10.64898/2026.05.26.26354117 medRxiv
Top 3%
0.0%
Show abstract

Introduction -- Emergency Medical Dispatch Systems (EMDS) can reduce delays in accessing emergency care by providing structured communication, triage, and coordination. However, such systems remain absent or underdeveloped in most low- or middle-income countries (LMICs). This study aimed to establish international consensus on essential EMDS components to inform global guidance. Methods -- We convened a multidisciplinary expert group to draft a preliminary list of essential components for three EMDS levels reflecting resource availability and system maturity. We then conducted a three-round Delphi with international experts to reach consensus on core EMDS components. Components which had [&ge;]75% agreement were included, those with [&ge;]75% disagreement were excluded. Components not achieving consensus by Round 3 were removed. Results were analysed overall and stratified by respondents' country income level. A subsequent online expert meeting resolved inconsistencies and finalised the component list. Results -- The expert group generated 111 components for each of three EMDS levels (Foundational, Emerging, and Established) spanning 11 operational domains. Of the 68 experts invited to the Delphi, 43 participated in Round 1 and 30 in Round 3. Across all Delphi rounds, 289 components reached consensus for inclusion. The consensus resulted in a final list of 227 components (63 Foundational, 84 Emerging, and 80 Established). Consensus agreement clustered around core EMDS domains including communication, structured call-taking and prioritisation, advice-giving, resource dispatch and tracking, and foundational governance and data functions, whereas items showing either non-consensus or consensus disagreement were typically technology-dependent or context-specific. Conclusions -- This international consensus offers guidance for EMDS development across diverse resource settings and provides a scalable roadmap to strengthen emergency care systems.

20
ERBB4 deficiency promotes atrial myopathy underlying the atrial fibrillation substrate

Yamaguchi, N.; Santucci, J.; Hong, S. J.; Ferrena, A.; Schlamp, F.; Willett, D.; Casdin, C. J.; Park, P. S.; Lin, X.; Xiao, J.; Hall, S.; Barnard, J.; Achter, J.; Kanhert, K.; Lundby, A.; Chung, M. K.; Van Wagoner, D. R.; Park, D. S.

2026-05-27 cardiovascular medicine 10.64898/2026.05.26.26354173 medRxiv
Top 3%
0.0%
Show abstract

Background Atrial fibrillation (AF) is a leading cause of stroke, cardiovascular morbidity, and mortality. Atrial myopathy, characterized by progressive metabolic, electrical, and structural changes, creates the arrhythmogenic substrate that drives AF. Defining the key drivers of atrial myopathic processes is essential for targeted therapies that can mitigate AF progression. Here we explore how reduced ERBB4 expression contributes to the development of left atrial myopathy. Methods We analyzed the Cleveland Clinic Biobank to compare left atrial ERBB4 levels in patients grouped by AF diagnosis. To investigate the impact of reduced ERBB4 levels on atrial tissue substrate, we created mouse models of cardiac-specific Erbb4 deficiency using Mlc2a (myosin light chain 2a)-Cre. Comprehensive physiological assessments were performed. Transcriptomic analyses of the left atrium were performed in an Erbb4 haploinsufficient mouse model and compared with human atrial datasets. Molecular validation of key dysregulated pathways was performed. Results We found that left atrial ERBB4 levels are reduced in patients with AF. Adult cardiomyocyte-specific Erbb4 heterozygous (Erbb4fl/+;Mlc2a-Cre) mice exhibited prolonged P-wave duration in the absence of ventricular dysfunction. Left atrial transcriptomic analysis in Erbb4 haploinsufficient mice showed upregulation of pathways related to fibrosis, apoptosis, and coagulation, and downregulation of pathways related to fatty acid metabolism and mitochondrial function, mirroring changes observed in pressure overload mouse models. A cross-species transcriptomic comparison revealed significant overlap between ERBB4-correlated gene expression and functional pathways in adult human atria and mice with Erbb4 haploinsufficiency. Validating the transcriptomic data, protein and functional assays demonstrated increased fibrosis, apoptosis, and oxidative stress in the mutant left atrial tissue. Conclusion Left atrial ERBB4 levels are reduced in AF patients. A mouse model of Erbb4 deficiency and human atrial transcriptomic analyses highlight a role for ERBB4 in supporting normal atrial metabolism while protecting against inflammation, apoptosis, and fibrosis.